This document was created from an R markdown file. The repository for the project can be found here. The data reported in the paper can be explored interactively at the Metalab website.

1 Details of calculating effect size

To standardize the effect size calculation, we converted some reported raw results to the proportion of correct responses. For looking-time studies that reported only raw looking times in seconds, we calculated the proportion of correct responses by dividing the mean looking time toward the matching scene by the sum of the mean looking times toward the matching and non-matching scenes (i.e., excluding look-away time from the denominator). The raw standard deviations were converted in the same way, by dividing them by this sum.

Below is a step-by-step example calculation using data from an experiment in Yuan & Fisher (2009). The table presents the raw data from Yuan & Fisher (2009, p. 622), Table 1. The values are mean looking times in seconds, with standard errors (SE) in parentheses.

| Dialogue Type | Sample Size | Two-participant Event | One-participant Event |
|---|---|---|---|
| Transitive | 8 | 4.82 (0.43) | 2.87 (0.51) |
| Intransitive | 8 | 3.33 (0.24) | 4.12 (0.40) |

When a paper provided only raw looking times, we converted the data into the proportion of looking time to the correct scene and its standard deviation, following the formulae below. For children hearing transitive sentences, the correct scene was the two-participant event; for children hearing intransitive sentences, it was the one-participant event. Standard deviations were calculated by first scaling the raw SE of the correct scene by the total looking time to both scenes, and then multiplying the result by the square root of the number of participants.

\[\begin{aligned} Mean_{Proportion} &= \frac{Time_{correct}}{Time_{correct} + Time_{incorrect}} \\ SD_{Proportion} &= \frac{SE_{Raw}}{Time_{correct} + Time_{incorrect}} \times \sqrt{N} \end{aligned}\]

\[\begin{aligned} Mean_{transitive} &= \frac{Time_{correct}}{Time_{correct} + Time_{incorrect}} \\ &= \frac{4.82}{4.82 + 2.87} \\ &\approx 0.627 \\ SD_{transitive} &= \frac{SE_{Raw}}{Time_{correct} + Time_{incorrect}} \times \sqrt{N} \\ &= \frac{0.43}{4.82 + 2.87} \times \sqrt{8} \\ &\approx 0.158 \end{aligned}\]

\[\begin{aligned} Mean_{intransitive} &= \frac{Time_{correct}}{Time_{correct} + Time_{incorrect}} \\ &= \frac{4.12}{3.33 + 4.12} \\ &\approx 0.553 \\ SD_{intransitive} &= \frac{SE_{Raw}}{Time_{correct} + Time_{incorrect}} \times \sqrt{N} \\ &= \frac{0.40}{3.33 + 4.12} \times \sqrt{8} \\ &\approx 0.152 \end{aligned}\]
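The conversion in the formulae above can be sketched in Python as follows. This is an illustrative re-implementation, not the project's own R script; the helper name `to_proportion` is hypothetical. As in the worked example, it assumes that the SE passed in is the SE of looking time to the correct scene.

```python
import math


def to_proportion(time_correct, time_incorrect, se_correct, n):
    """Convert raw looking times (s) to a proportion-correct mean and SD.

    Look-away time is excluded: the denominator is the total looking time
    to the correct and incorrect scenes only.
    """
    total = time_correct + time_incorrect
    mean_prop = time_correct / total
    # Scale the raw SE by the same total, then turn SE into SD via sqrt(N).
    sd_prop = se_correct / total * math.sqrt(n)
    return mean_prop, sd_prop


# Values from Yuan & Fisher (2009), Table 1, as reproduced above.
transitive = to_proportion(4.82, 2.87, 0.43, 8)    # correct: two-participant event
intransitive = to_proportion(4.12, 3.33, 0.40, 8)  # correct: one-participant event
print([round(x, 3) for x in transitive])    # → [0.627, 0.158]
print([round(x, 3) for x in intransitive])  # → [0.553, 0.152]
```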

Using the standardized data presented in the table below, we calculated Cohen's d and its variance as follows (the implementation of the script can be found at XXX):

| Dialogue Type | Sample Size | Mean Proportion | Standard Deviation |
|---|---|---|---|
| Transitive | 8 | 0.627 | 0.158 |
| Intransitive | 8 | 0.553 | 0.152 |

\[\begin{aligned} d_{transitive} &= \frac{M_1 - M_2}{\sigma_{pooled}} \\ &= \frac{M_{correct} - M_{chance}}{\sigma_{correct}} \\ &= \frac{0.627 - 0.5}{0.158} \\ &\approx 0.80 \\ d_{intransitive} &= \frac{M_1 - M_2}{\sigma_{pooled}} \\ &= \frac{M_{correct} - M_{chance}}{\sigma_{correct}} \\ &= \frac{0.553 - 0.5}{0.152} \\ &\approx 0.35 \end{aligned}\]

\[\begin{aligned} var(d_{transitive}) &= \frac{1}{N} + \frac{d^2}{2 \times N} \\ &= \frac{1}{8} + \frac{0.80^2}{2 \times 8} \\ &= 0.165 \\ var(d_{intransitive}) &= \frac{1}{N} + \frac{d^2}{2 \times N} \\ &= \frac{1}{8} + \frac{0.35^2}{2 \times 8} \\ &\approx 0.13 \end{aligned}\]
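A minimal Python sketch of the against-chance effect size and its variance, written directly from the formulae above (chance level 0.5, one group of N participants). The function names are hypothetical and the sketch is not the project's script; it takes the standardized means and SDs from the table as inputs.

```python
import math


def d_against_chance(mean_prop, sd_prop, chance=0.5):
    """Cohen's d comparing a proportion-correct mean against chance."""
    return (mean_prop - chance) / sd_prop


def var_d_one_sample(d, n):
    """Sampling variance of a one-sample d: 1/N + d^2 / (2N)."""
    return 1 / n + d ** 2 / (2 * n)


for label, mean_prop, sd_prop in [("transitive", 0.627, 0.158),
                                  ("intransitive", 0.553, 0.152)]:
    d = d_against_chance(mean_prop, sd_prop)
    v = var_d_one_sample(d, n=8)
    print(f"{label}: d = {d:.2f}, var(d) = {v:.3f}")
```

Computing the variance from the unrounded d, as the loop does, can shift the last reported digit slightly relative to a hand calculation that rounds d first.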

2 Comparison of between-subjects and within-subjects effect sizes

The forest plot below compares the two ways of calculating effect sizes on a subset of experimental conditions. We chose this subset because the main analyses reported for these conditions in the original reports were between-group comparisons: they compared the proportion of looking time to the causative event between the transitive and intransitive conditions. Effect sizes calculated with this between-group method are denoted by red dots. For the same subset of conditions, we also calculated effect sizes with the against-chance method; these are denoted by black dots. The against-chance method is a more conservative way of estimating the effect size. As the forest plot shows, the meta-analytic effect size from the between-group calculation is larger than the one from the against-chance method.
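To make the contrast concrete, here is a hedged Python sketch of both calculations on hypothetical summary data (the means, SDs and sample sizes are invented for illustration and are not taken from the meta-analysis). The between-group d uses a pooled SD and the standard independent-groups variance formula; the against-chance d compares each condition with 0.5. In this hypothetical example, the two conditions deviate from chance in opposite directions, so their difference is larger than either deviation alone.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def d_between(m1, sd1, n1, m2, sd2, n2):
    """Between-group d and its variance (two independent groups)."""
    d = (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)
    var = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return d, var


def d_chance(m, sd, n, chance=0.5):
    """Against-chance d and its variance (one group)."""
    d = (m - chance) / sd
    return d, 1 / n + d ** 2 / (2 * n)


# Hypothetical proportions of looking at the causative (two-participant) event.
trans = (0.60, 0.20, 12)    # mean, SD, N after transitive sentences
intrans = (0.40, 0.20, 12)  # mean, SD, N after intransitive sentences

print(d_between(*trans, *intrans))  # transitive vs. intransitive directly
print(d_chance(*trans))             # transitive condition vs. 0.5
# For intransitive sentences the correct scene is the one-participant event,
# so proportion correct is 1 - 0.40; the SD of 1 - p equals the SD of p.
print(d_chance(1 - intrans[0], intrans[1], intrans[2]))
```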

3 Sensitivity analysis

The plot below shows a modified funnel plot, or “significance funnel”, in which significant studies are shown in orange and non-significant studies in grey (Mathur & VanderWeele, 2020). The x-axis shows effect size estimates, and the y-axis shows the estimated standard error of each estimate. Studies lying on the grey line have a p-value of exactly .05. The black diamond shows the meta-analytic effect size estimate for all studies; the grey diamond shows the meta-analytic effect size estimate for non-significant studies only (the “worst-case” publication bias scenario). Note that the worst-case scenario appreciably attenuates the effect size estimate, but does not attenuate the point estimate to 0 (worst-case estimate: 0.08 [-0.1, 0.25]).
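The sketch below illustrates the logic of the significance funnel under simplifying assumptions. A study lies on the grey line when its estimate is about 1.96 standard errors above zero; studies to the right of the line are significant in the positive direction ("affirmative"). Following our reading of Mathur & VanderWeele (2020), the worst-case estimate pools only the remaining, non-affirmative studies. For brevity the sketch pools with a simple fixed-effect inverse-variance mean rather than the model used in the published method, and the (effect size, SE) pairs are hypothetical.

```python
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided alpha = .05, about 1.96


def is_affirmative(yi, sei):
    """Significant (two-sided p < .05) and in the positive direction."""
    return yi / sei > Z_CRIT


def fixed_effect_mean(estimates):
    """Inverse-variance weighted mean of (effect size, SE) pairs."""
    weights = [1 / se ** 2 for _, se in estimates]
    return sum(w * y for w, (y, _) in zip(weights, estimates)) / sum(weights)


# Hypothetical (effect size, SE) pairs, not data from this meta-analysis.
studies = [(0.9, 0.3), (0.1, 0.2), (0.3, 0.2), (-0.2, 0.25)]
nonaffirmative = [s for s in studies if not is_affirmative(*s)]

all_est = fixed_effect_mean(studies)            # analogue of the black diamond
worst_case = fixed_effect_mean(nonaffirmative)  # analogue of the grey diamond
print(round(all_est, 3), round(worst_case, 3))
```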

4 Main model results

The tables below show the estimates for the single-moderator models reported in the main text. Across the single-predictor models, only predicate type was a significant moderator: conditions in which children heard transitive sentences had larger effect sizes. Median productive vocabulary size was a marginally significant moderator.

In the tables throughout the supplemental information, we report the point estimates of the parameters with their 95% confidence intervals in square brackets (i.e., [lower bound, upper bound]). Estimates whose p-value reaches the .05 threshold are marked with an asterisk (*) next to the p-value. For categorical variables, the base level is the first level listed in parentheses.
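Assuming the reported z values are Wald statistics (estimate divided by its standard error) with normal-theory 95% confidence intervals, as is the default in common meta-regression software, the columns of each table are linked as in the sketch below. The function name is hypothetical. Because the tables round estimates to two decimals, recomputing z or p from the rounded numbers will only approximately match the reported values.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def wald_summary(estimate, se, level=0.95):
    """Return z, two-sided p, a normal-theory CI and a significance flag."""
    z = estimate / se
    p = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    crit = STD_NORMAL.inv_cdf(1 - (1 - level) / 2)
    ci = (estimate - crit * se, estimate + crit * se)
    # Asterisk when p is at or below .05, mirroring the tables' convention.
    flag = "*" if p <= 0.05 else ""
    return z, p, ci, flag


z, p, ci, flag = wald_summary(0.5, 0.25)  # hypothetical estimate and SE
print(f"z = {z:.2f}, p = {p:.3f}{flag}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")
```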

4.1 Mean age

| Parameter | Estimate [95% CI] | z value | p value |
|---|---|---|---|
| Intercept | 0.6 [0.07, 1.13] | 2.23 | 0.03* |
| Mean Age (months) | -0.01 [-0.03, <.001] | -1.47 | 0.14 |

4.2 Median productive vocabulary size

| Parameter | Estimate [95% CI] | z value | p value |
|---|---|---|---|
| Intercept | 0.66 [0.04, 1.28] | 2.08 | 0.04* |
| Median productive vocabulary size | -0.01 [-0.02, <.001] | -1.74 | 0.08 |

4.3 Predicate Type

| Parameter | Estimate [95% CI] | z value | p value |
|---|---|---|---|
| Intercept | 0.1 [-0.14, 0.34] | 0.8 | 0.42 |
| Predicate type (Transitive / Intransitive) | 0.24 [0.02, 0.46] | 2.1 | 0.04* |

4.4 Noun phrase type

| Parameter | Estimate [95% CI] | z value | p value |
|---|---|---|---|
| Intercept | 0.26 [-0.02, 0.53] | 1.83 | 0.07 |
| Noun phrase type (Pronoun / Noun) | -0.04 [-0.4, 0.31] | -0.24 | 0.81 |

4.5 Character identification phase

| Parameter | Estimate [95% CI] | z value | p value |
|---|---|---|---|
| Intercept | 0.19 [-0.06, 0.43] | 1.51 | 0.13 |
| Character identification phase (Yes / No) | 0.18 [-0.28, 0.64] | 0.75 | 0.45 |

4.6 Practice phase

| Parameter | Estimate [95% CI] | z value | p value |
|---|---|---|---|
| Intercept | 0.35 [0.09, 0.62] | 2.59 | 0.01* |
| Practice phase (Yes / No) | -0.21 [-0.5, 0.09] | -1.35 | 0.18 |

4.7 Synchronicity

| Parameter | Estimate [95% CI] | z value | p value |
|---|---|---|---|
| Intercept | 0.2 [-0.05, 0.44] | 1.59 | 0.11 |
| Synchronicity (Simultaneous / Asynchronous) | 0.1 [-0.22, 0.41] | 0.59 | 0.55 |

4.8 Testing structure

| Parameter | Estimate [95% CI] | z value | p value |
|---|---|---|---|
| Intercept | 0.13 [-0.1, 0.35] | 1.1 | 0.27 |
| Testing Procedure Structure (Mass / Distributed) | 0.37 [-0.05, 0.79] | 1.75 | 0.08 |

4.9 Number of sentence repetitions

| Parameter | Estimate [95% CI] | z value | p value |
|---|---|---|---|
| Intercept | 0.18 [-0.12, 0.48] | 1.18 | 0.24 |
| Number of sentence repetitions | 0.01 [-0.02, 0.03] | 0.52 | 0.6 |

5 Main models results using dataset without imputed values

As mentioned in the Methods section, for studies missing relevant statistics, we imputed values from studies with similar designs (e.g., Hirsh-Pasek, Golinkoff, & Naigles, 1996). The tables below report the results of fitting exactly the same models to the dataset excluding the imputed study. The results did not differ meaningfully between the two datasets: the estimates changed by at most 0.02, and the same parameters reached significance.

5.1 Mean age

| Parameter | Estimate [95% CI] | z value | p value |
|---|---|---|---|
| Intercept | 0.6 [0.06, 1.13] | 2.2 | 0.03* |
| Mean Age (months) | -0.01 [-0.03, <.001] | -1.47 | 0.14 |

5.2 Median productive vocabulary size

| Parameter | Estimate [95% CI] | z value | p value |
|---|---|---|---|
| Intercept | 0.66 [0.04, 1.28] | 2.08 | 0.04* |
| Median productive vocabulary size | -0.01 [-0.02, <.001] | -1.74 | 0.08 |

5.3 Predicate Type

| Parameter | Estimate [95% CI] | z value | p value |
|---|---|---|---|
| Intercept | 0.1 [-0.15, 0.35] | 0.76 | 0.45 |
| Predicate type (Transitive / Intransitive) | 0.24 [0.01, 0.46] | 2.08 | 0.04* |

5.4 Noun phrase type

| Parameter | Estimate [95% CI] | z value | p value |
|---|---|---|---|
| Intercept | 0.25 [-0.04, 0.54] | 1.68 | 0.09 |
| Noun phrase type (Pronoun / Noun) | -0.04 [-0.4, 0.33] | -0.2 | 0.84 |

5.5 Character identification phase

| Parameter | Estimate [95% CI] | z value | p value |
|---|---|---|---|
| Intercept | 0.17 [-0.08, 0.43] | 1.34 | 0.18 |
| Character identification phase (Yes / No) | 0.19 [-0.28, 0.66] | 0.78 | 0.43 |

5.6 Practice phase

| Parameter | Estimate [95% CI] | z value | p value |
|---|---|---|---|
| Intercept | 0.35 [0.07, 0.63] | 2.44 | 0.01* |
| Practice phase (Yes / No) | -0.21 [-0.51, 0.1] | -1.31 | 0.19 |

5.7 Synchronicity

| Parameter | Estimate [95% CI] | z value | p value |
|---|---|---|---|
| Intercept | 0.2 [-0.05, 0.44] | 1.54 | 0.12 |
| Synchronicity (Simultaneous / Asynchronous) | 0.09 [-0.24, 0.41] | 0.53 | 0.59 |

5.8 Testing structure

| Parameter | Estimate [95% CI] | z value | p value |
|---|---|---|---|
| Intercept | 0.11 [-0.12, 0.35] | 0.93 | 0.35 |
| Testing Procedure Structure (Mass / Distributed) | 0.39 [-0.04, 0.82] | 1.78 | 0.07 |

5.9 Number of sentence repetitions

| Parameter | Estimate [95% CI] | z value | p value |
|---|---|---|---|
| Intercept | 0.17 [-0.14, 0.48] | 1.05 | 0.29 |
| Number of sentence repetitions | 0.01 [-0.02, 0.03] | 0.55 | 0.58 |

6 Models with Methodological Moderators and Theoretical Moderators

Syntactic bootstrapping studies differ in their implementation details. Here we examine to what extent the influences of the theoretical moderators can be accounted for by methodological factors. The tables below present the results of models that include all the key methodological moderators and one theoretical moderator. The pattern was consistent with the single-predictor theoretical models: predicate type remained a significant predictor of the effect size.

In the tables below, the row representing the theoretical moderator (the last row of each table) is highlighted in yellow.

6.1 With age

| Parameter | Estimate [95% CI] | z value | p value |
|---|---|---|---|
| Intercept | -0.03 [-0.93, 0.88] | -0.06 | 0.95 |
| Character identification phase (No / Yes) | 0.24 [-0.3, 0.78] | 0.88 | 0.38 |
| Practice phase (No / Yes) | -0.11 [-0.46, 0.24] | -0.62 | 0.54 |
| Stimuli synchronicity (Asynchronous / Simultaneous) | 0.29 [-0.26, 0.83] | 1.02 | 0.31 |
| Testing structure (Distributed / Mass) | 0.47 [0.02, 0.92] | 2.07 | 0.04* |
| Number of sentence repetitions | 0.02 [-0.02, 0.06] | 0.93 | 0.35 |
| Mean age (months) | -0.01 [-0.03, 0.02] | -0.53 | 0.59 |

6.2 With productive vocabulary size

| Parameter | Estimate [95% CI] | z value | p value |
|---|---|---|---|
| Intercept | -0.24 [-2.19, 1.72] | -0.24 | 0.81 |
| Character identification phase (No / Yes) | 0.33 [-0.76, 1.43] | 0.60 | 0.55 |
| Practice phase (No / Yes) | 0.01 [-1.05, 1.07] | 0.02 | 0.98 |
| Stimuli synchronicity (Asynchronous / Simultaneous) | 0.17 [-1.14, 1.48] | 0.25 | 0.8 |
| Testing structure (Distributed / Mass) | 0.95 [0.23, 1.67] | 2.59 | 0.01* |
| Number of sentence repetitions | 0.01 [-0.1, 0.12] | 0.14 | 0.89 |
| Median productive vocabulary size | -0.01 [-0.03, 0.01] | -0.74 | 0.46 |

6.3 With predicate type

| Parameter | Estimate [95% CI] | z value | p value |
|---|---|---|---|
| Intercept | -0.4 [-1.01, 0.21] | -1.29 | 0.2 |
| Character identification phase (No / Yes) | 0.27 [-0.25, 0.79] | 1.03 | 0.3 |
| Practice phase (No / Yes) | -0.08 [-0.38, 0.22] | -0.51 | 0.61 |
| Stimuli synchronicity (Asynchronous / Simultaneous) | 0.29 [-0.22, 0.8] | 1.11 | 0.27 |
| Testing structure (Distributed / Mass) | 0.54 [0.09, 0.99] | 2.34 | 0.02* |
| Number of sentence repetitions | 0.02 [-0.01, 0.06] | 1.19 | 0.23 |
| Predicate type (Intransitive / Transitive) | 0.24 [>-.001, 0.47] | 1.98 | 0.05* |

6.4 With noun phrase type

| Parameter | Estimate [95% CI] | z value | p value |
|---|---|---|---|
| Intercept | -0.26 [-0.98, 0.45] | -0.73 | 0.47 |
| Character identification phase (No / Yes) | 0.21 [-0.32, 0.73] | 0.77 | 0.44 |
| Practice phase (No / Yes) | -0.17 [-0.5, 0.16] | -1.03 | 0.31 |
| Stimuli synchronicity (Asynchronous / Simultaneous) | 0.37 [-0.28, 1.03] | 1.12 | 0.26 |
| Testing structure (Distributed / Mass) | 0.44 [-0.1, 0.98] | 1.61 | 0.11 |
| Number of sentence repetitions | 0.02 [-0.01, 0.06] | 1.23 | 0.22 |
| Noun phrase type (Noun / Pronoun) | 0.07 [-0.5, 0.65] | 0.25 | 0.8 |

7 Additional Moderators

7.1 Relationship between moderators

We coded additional moderators: the modality of the visual stimuli, the actors in them, and the types of events they depicted. Stimuli modality has two levels: videos and animations. We coded this moderator following the details provided in the Methods sections of the papers. Stimuli actors have two levels: human actors and non-human actors. Studies using visual stimuli with human actors wearing animal suits were coded as using non-human actors.

To capture the event types of the visual stimuli, we coded the transitive and intransitive action stimuli separately. The transitive event has two levels: direct caused action and indirect caused action. An event was coded as direct caused action if the agent directly acted upon the patient, and as indirect caused action if the agent caused the patient to move via another medium (for example, the agent may pull a band on the patient's waist, causing her to move). Likewise, the intransitive event has two levels: one action versus parallel actions, coded by the number of participants presented on the screen. An intransitive event was coded as “one action” if and only if a single agent was presented on the screen. If the intransitive event involved more than one actor (e.g., two actors performing parallel actions, or one actor with one bystander), it was coded as parallel actions.
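The coding rules above can be written down as a small decision procedure. The Python sketch below is one hypothetical encoding of those rules; the `Stimulus` record and its field names are our own illustration, not the project's codebook or data format.

```python
from dataclasses import dataclass


@dataclass
class Stimulus:
    """Hypothetical description of one condition's visual stimuli."""
    actors_are_humans: bool            # real people appear on screen
    humans_in_animal_suits: bool       # those people wear animal suits
    agent_acts_directly: bool          # transitive scene: agent acts on patient
    actors_in_intransitive_scene: int  # number of actors on screen


def code_actor(s):
    # Human actors in animal suits are coded as non-human actors.
    if s.actors_are_humans and not s.humans_in_animal_suits:
        return "human"
    return "non-human"


def code_transitive_event(s):
    # Direct: the agent acts on the patient; indirect: via another medium.
    return "direct caused action" if s.agent_acts_directly else "indirect caused action"


def code_intransitive_event(s):
    # "One action" if and only if exactly one actor is on screen.
    return "one action" if s.actors_in_intransitive_scene == 1 else "parallel actions"


# Example: an actor in an animal suit pulls a band, with a bystander present.
example = Stimulus(True, True, False, 2)
print(code_actor(example), code_transitive_event(example), code_intransitive_event(example))
```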

These additional moderators were not included in the main analyses because they were closely related to one another and to the main moderators. The heatmaps below show the overlap between moderators. Each cell corresponds to the co-occurrence of two moderator levels: brighter colors indicate more frequent co-occurrence, and darker colors indicate less frequent co-occurrence. In the interactive version of this document, hovering the mouse over a cell shows its value and the corresponding combination of levels.
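Each heatmap cell is a co-occurrence count. A minimal Python sketch of how such counts could be tabulated, assuming one record of moderator levels per experimental condition; the records below are hypothetical, and this is not necessarily the procedure behind the published heatmaps.

```python
from collections import Counter
from itertools import combinations

# Hypothetical coded conditions: one dict of moderator levels per condition.
conditions = [
    {"modality": "video", "actor": "human", "predicate": "transitive"},
    {"modality": "video", "actor": "human", "predicate": "intransitive"},
    {"modality": "animation", "actor": "non-human", "predicate": "transitive"},
    {"modality": "video", "actor": "non-human", "predicate": "transitive"},
]


def cooccurrence(conditions):
    """Count how often each pair of moderator levels appears in one condition."""
    counts = Counter()
    for cond in conditions:
        # Sort so that each unordered pair always gets the same key order.
        levels = sorted(f"{k}={v}" for k, v in cond.items())
        for a, b in combinations(levels, 2):
            counts[(a, b)] += 1
    return counts


counts = cooccurrence(conditions)
print(counts[("actor=human", "modality=video")])
```

Filling a symmetric matrix from these pair counts, with one row and one column per moderator level, gives the grid that a heatmap displays.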

7.1.1 Ordered by Row Average

7.1.2 Ordered by groups

7.2 Model results

The tables below present models with exploratory moderators. The base levels for the categorical moderators are the first levels listed in parentheses.

7.2.1 Patient argument type for transitive sentence

In the main analysis, we presented a model of the relationship between effect size and the agent argument type, and found that whether the agent argument was a noun or a pronoun did not significantly predict the effect size. Here, we present a similar analysis of the influence of the patient argument type. Because English intransitive sentences by definition have no patient argument, we focus on the subset of studies that used transitive sentences (\(N\) = 30).

| Parameter | Estimate [95% CI] | z value | p value |
|---|---|---|---|
| Intercept | 0.35 [0.08, 0.62] | 2.56 | 0.01* |
| Patient Argument Type (Noun / Pronoun) | -0.17 [-0.56, 0.22] | -0.86 | 0.39 |

7.2.2 Stimuli Modality

We found that the presentation modality of the stimuli was not a significant predictor of the effect size. In other words, studies that presented young children with animation clips did not have significantly different effect sizes from studies using video clips. The model statistics are shown below. Note that stimuli modality and stimuli actors overlapped substantially across studies, so this result should be interpreted with caution.

| Parameter | Estimate [95% CI] | z value | p value |
|---|---|---|---|
| Intercept | 0.59 [0.12, 1.06] | 2.45 | 0.01* |
| Stimuli Modality (Animation / Video) | -0.37 [-0.82, 0.08] | -1.63 | 0.1 |

7.2.3 Stimuli actors

There was a marginal effect of stimuli actors. Studies with human actors as protagonists in the events had somewhat smaller effect sizes than studies using puppets, human actors in animal suits, or animated geometric shapes. This might be due to the relatively higher visual complexity of stimuli featuring real human actors.

| Parameter | Estimate [95% CI] | z value | p value |
|---|---|---|---|
| Intercept | 0.43 [0.13, 0.72] | 2.85 | <.01* |
| Stimuli Actor (Non-person / Person) | -0.29 [-0.62, 0.03] | -1.75 | 0.08 |

7.2.4 Type of event

Studies differed in the types of transitive and intransitive events they presented. Previous studies have shown that young children's looking behavior in the Intermodal Preferential Looking Paradigm is very sensitive to subtle perceptual differences in the visual stimuli (Delle Luche, Durrant, Poltrock, & Floccia, 2015; Fernald, Zangl, Portillo, & Marchman, 2008). Therefore, we coded the types of events presented in the visual stimuli. There were two types of transitive events: direct causal action and indirect causal action. The former involved the agent directly acting on the patient and causing the patient to move. The latter involved a means-end sequence leading to the caused action of the patient; for example, the agent may pull a band on the patient's waist, causing the patient to move. There were also two types of intransitive events used in the literature. One involved a single actor acting, such as jumping up and down. The other involved two actors presented without any causal action.

Our models suggested that neither variable was predictive of the effect sizes.

7.2.4.1 Transitive event type

| Parameter | Estimate [95% CI] | z value | p value |
|---|---|---|---|
| Intercept | 0.24 [0.02, 0.46] | 2.13 | 0.03* |
| Transitive Event Type (Direct caused action / Indirect caused action) | -0.02 [-0.41, 0.37] | -0.1 | 0.92 |

7.2.4.2 Intransitive event type

| Parameter | Estimate [95% CI] | z value | p value |
|---|---|---|---|
| Intercept | 0.31 [-0.04, 0.65] | 1.76 | 0.08 |
| Intransitive Event Type (One action / Parallel actions) | -0.09 [-0.43, 0.25] | -0.51 | 0.61 |

8 Variability in visual stimuli as a function of age

There was some evidence that researchers adapted the visual complexity of the stimuli to children's age. We collected the available visual stimuli from the papers and their supporting materials. When actual screenshots were not provided, we used schematic illustrations of the visual stimuli; when the visual stimuli were unavailable altogether, we used screenshots of the text descriptions of the events. Because some publishers converted the visual stimuli to black and white, we converted all visual stimuli to grayscale for easier visual comparison.

As the plot shows, studies with particularly young children used noticeably simpler visual stimuli. This adaptation might be partly responsible for the lack of an age effect in our sample.

9 Power analysis

We conducted a power analysis using the pwr package. In the plot, the x-axis represents the number of participants in each condition, and the y-axis represents the estimated power given the current meta-analytic effect size estimate. The horizontal black dotted line marks 80% power, and the vertical black dotted lines mark the number of participants needed to reach 80% power (N = 142). The red lines mark the current power (14.36%), based on the approximate mean sample size (N = 14) of the conditions included in the meta-analysis.
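The sketch below shows the general shape of such a calculation, assuming a two-sided one-sample test of the effect size against zero (in pwr this would correspond to `pwr.t.test` with `type = "one.sample"`, though the exact call used is not stated here). It uses a normal approximation from the Python standard library rather than pwr's noncentral t distribution, so for small samples it slightly overstates power, and the t-based calculation typically requires a slightly larger N. The effect size below is hypothetical, not the meta-analytic estimate.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def approx_power(d, n, alpha=0.05):
    """Approximate power of a two-sided one-sample test of d against 0.

    Normal approximation; pwr::pwr.t.test uses the noncentral t instead.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * math.sqrt(n)
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def required_n(d, target=0.80, alpha=0.05):
    """Smallest n whose approximate power reaches the target."""
    n = 2
    while approx_power(d, n, alpha) < target:
        n += 1
    return n


d_example = 0.5  # hypothetical effect size, not the meta-analytic estimate
print(required_n(d_example), round(approx_power(d_example, 14), 3))
```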

References

Delle Luche, C., Durrant, S., Poltrock, S., & Floccia, C. (2015). A methodological investigation of the Intermodal Preferential Looking paradigm: Methods of analyses, picture selection and data rejection criteria. Infant Behavior and Development, 40, 151-172.

Fernald, A., Zangl, R., Portillo, A. L., & Marchman, V. A. (2008). Looking while listening: Using eye movements to monitor spoken language. Developmental psycholinguistics: On-line methods in children’s language processing, 44, 97.

Mathur, M. B., & VanderWeele, T. J. (2020). Sensitivity analysis for publication bias in meta‐analyses. Journal of the Royal Statistical Society. Series C, Applied Statistics, 69(5), 1091.